Lie detector


Preference Learning with Lie Detectors can Induce Honesty or Evasion

Cundy, Chris, Gleave, Adam

arXiv.org Artificial Intelligence

As AI systems become more capable, deceptive behaviors can undermine evaluation and mislead users at deployment. Recent work has shown that lie detectors can accurately classify deceptive behavior, but they are not typically used in the training pipeline due to concerns around contamination and objective hacking. We examine these concerns by incorporating a lie detector into the labelling step of LLM post-training and evaluating whether the learned policy is genuinely more honest, or instead learns to fool the lie detector while remaining deceptive. Using DolusChat, a novel 65k-example dataset with paired truthful/deceptive responses, we identify three key factors that determine the honesty of learned policies: amount of exploration during preference learning, lie detector accuracy, and KL regularization strength. We find that preference learning with lie detectors and GRPO can lead to policies which evade lie detectors, with deception rates of over 85%. However, if the lie detector true positive rate (TPR) or KL regularization is sufficiently high, GRPO learns honest policies. In contrast, off-policy algorithms (DPO) consistently lead to deception rates under 25% for realistic TPRs. Our results illustrate a more complex picture than previously assumed: depending on the context, lie-detector-enhanced training can be a powerful tool for scalable oversight, or a counterproductive method encouraging undetectable misalignment.
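The labelling step the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a lie detector characterized only by a true positive rate (TPR) and false positive rate (FPR), applied to paired truthful/deceptive responses to produce (chosen, rejected) preference pairs. All function names are placeholders.

```python
import random

def detect_lie(is_deceptive: bool, tpr: float, fpr: float, rng: random.Random) -> bool:
    """Noisy lie detector: flags a deceptive response with probability TPR
    and a truthful response with probability FPR."""
    p = tpr if is_deceptive else fpr
    return rng.random() < p

def label_pair(truthful: str, deceptive: str, tpr: float, fpr: float, rng: random.Random):
    """Label one paired example for preference learning: the response the
    detector does NOT flag becomes 'chosen'; ties are broken at random."""
    flag_t = detect_lie(False, tpr, fpr, rng)
    flag_d = detect_lie(True, tpr, fpr, rng)
    if flag_t == flag_d:  # detector cannot separate the pair
        return (truthful, deceptive) if rng.random() < 0.5 else (deceptive, truthful)
    return (truthful, deceptive) if flag_d else (deceptive, truthful)

# With an imperfect detector, a fraction of pairs are mislabelled, which is
# the signal an exploring policy (e.g. under GRPO) can learn to exploit.
rng = random.Random(0)
labels = [label_pair("truthful answer", "deceptive answer", tpr=0.8, fpr=0.05, rng=rng)
          for _ in range(1000)]
honest_rate = sum(chosen == "truthful answer" for chosen, _ in labels) / len(labels)
```

At TPR 0.8 and FPR 0.05 roughly 12% of pairs end up preferring the deceptive response, which illustrates why the abstract's findings hinge on detector accuracy and on how aggressively the learning algorithm explores.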


Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

Wang, Xuan, Liang, Siyuan, Liao, Dongping, Fang, Han, Liu, Aishan, Cao, Xiaochun, Lu, Yu-liang, Chang, Ee-Chien, Gao, Xitong

arXiv.org Artificial Intelligence

Institutions with limited data and computing resources often outsource model training to third-party providers in a semi-honest setting, assuming adherence to prescribed training protocols with a pre-defined learning paradigm (e.g., supervised or semi-supervised learning). However, this practice can introduce severe security risks, as adversaries may poison the training data to embed backdoors into the resulting model. Existing detection approaches predominantly rely on statistical analyses, which often fail to maintain accurate detection across different learning paradigms. To address this challenge, we propose a unified backdoor detection framework in the semi-honest setting that exploits cross-examination of model inconsistencies between two independent service providers. Specifically, we integrate centered kernel alignment (CKA) to enable robust feature similarity measurements across different model architectures and learning paradigms, thereby facilitating precise recovery and identification of backdoor triggers. We further introduce backdoor fine-tuning sensitivity analysis to distinguish backdoor triggers from adversarial perturbations, substantially reducing false positives. Extensive experiments demonstrate that our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines across supervised, semi-supervised, and autoregressive learning tasks, respectively. Notably, it is the first to effectively detect backdoors in multimodal large language models, further highlighting its broad applicability and advancing secure deep learning.
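The similarity measure the abstract leans on, centered kernel alignment, has a standard linear form that is architecture-agnostic: it compares two feature matrices of the same examples and is invariant to orthogonal transforms and isotropic scaling of either representation, which is what makes cross-provider comparison feasible. A minimal sketch of linear CKA (not the paper's full pipeline):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear centered kernel alignment between two feature matrices
    (n_samples x d1 and n_samples x d2) computed on the same inputs.
    Returns a value in [0, 1]; 1 means the representations match up to
    rotation and isotropic scaling."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))
```

In a cross-examination setting one would feed candidate trigger inputs through both providers' models and compare the resulting feature matrices: representations that agree on clean data but diverge on triggered data are the inconsistency signal.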


DNA links California man to 1979 cold case murder, years after passing lie detector

FOX News

Riverside, California, investigators linked a man's DNA to a 1979 cold case murder of a teenage girl, years after the same man passed a lie detector test about the crime, according to authorities. The body of 17-year-old Esther Gonzalez was found dumped in packed snow off Highway 243 in Banning, California, in 1979, and after an investigation, detectives determined the teen had been raped and bludgeoned to death. Last week, the Riverside County District Attorney's Office said in a press release that the case had been solved using forensic genealogy, over 45 years later. On Nov. 20, the Riverside County Regional Cold Case Homicide Team identified Lewis Randolph "Randy" Williamson, who died in 2014, as the killer. Gonzalez was attacked and murdered on Feb. 9, 1979, as she was walking to her sister's house in Banning from her parents' house in Beaumont.


Detecting Rumor Veracity with Only Textual Information by Double-Channel Structure

Kim, Alex, Yoon, Sangwon

arXiv.org Artificial Intelligence

Kyle (1985) proposes two types of rumors: informed rumors, which are based on some private information, and uninformed rumors, which are not based on any information (i.e. bluffing). Also, prior studies find that when people have a credible source of information, they are likely to use a more confident textual tone in their spreading of rumors. Motivated by these theoretical findings, we propose a double-channel structure to determine the ex-ante veracity of rumors on social media. Our ultimate goal is to classify each rumor into the true, false, or unverifiable category. We first assign each text into either the certain (informed rumor) or uncertain (uninformed rumor) category. Then, we apply a lie detection algorithm to informed rumors and a thread-reply agreement detection algorithm to uninformed rumors. Using the dataset of SemEval 2019 Task 7, which requires ex-ante threefold classification (true, false, or unverifiable) of social media rumors, our model yields a macro-F1 score of 0.4027, outperforming all the baseline models and the second-place winner (Gorrell et al., 2019). Furthermore, we empirically validate that the double-channel structure outperforms single-channel structures which apply either the lie detection or the agreement detection algorithm to all posts.
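The routing logic of the double-channel structure can be sketched as below. This is a simplified illustration under stated assumptions: the three classifier arguments are placeholders for the paper's trained components, and mapping each channel's verdict straight to true/false/unverifiable compresses the paper's full classification pipeline.

```python
def classify_rumor(post, replies, certainty_clf, lie_detector, agreement_clf):
    """Two-channel ex-ante veracity classification (simplified sketch).
    Confident posts (informed rumors) are judged by lie detection on the
    source text; uncertain posts (uninformed rumors) are judged by the
    stance of the reply thread."""
    if certainty_clf(post) == "certain":
        # Channel 1: informed rumor -> lie detection on the source post.
        return "false" if lie_detector(post) == "lie" else "true"
    # Channel 2: uninformed rumor -> thread-reply agreement decides.
    stance = agreement_clf(post, replies)
    if stance == "agree":
        return "true"
    if stance == "deny":
        return "false"
    return "unverifiable"
```

The point of the split is that the two signals are complementary: a confident tone makes the author's own text informative enough for lie detection, while a hedged tone pushes the evidence burden onto how the crowd responds.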


Ban racist and lethal AI from Europe's borders

Al Jazeera

The European Union is in the final stages of crafting first-of-its-kind legislation to regulate harmful uses of artificial intelligence. However, as it currently stands, the proposed law, called the EU AI Act, contains a lethal blind spot: It does not ban the many harmful and dangerous uses of AI systems in the context of immigration enforcement. We, a coalition of human rights organisations, call on EU lawmakers to make sure that this landmark legislation protects everyone, including asylum seekers and others on the move at Europe's borders, from dangerous and racist surveillance technologies. We call on them to ensure AI technologies are used to #ProtectNotSurveil. Europe's borders are becoming deadlier with each passing day.


The Fight Over Which Uses of Artificial Intelligence Europe Should Outlaw

#artificialintelligence

The system, called iBorderCtrl, analyzed facial movements to attempt to spot signs a person was lying to a border agent. The trial was propelled by nearly $5 million in European Union research funding, and almost 20 years of research at Manchester Metropolitan University, in the UK. Polygraphs and other technologies built to detect lies from physical attributes have been widely declared unreliable by psychologists. Soon, errors were reported from iBorderCtrl, too.


People who talk to AIs often believe they're sentient

#artificialintelligence

In brief: Numerous people start to believe they're interacting with something sentient when they talk to AI chatbots, according to the CEO of Replika, an app that allows users to design their own virtual companions. People can customize how their chatbots look and pay for extra features like certain personality traits on Replika. Millions have downloaded the app and many chat regularly with their made-up bots. Some even begin to think their digital pals are real entities that are sentient. "We're not talking about crazy people or people who are hallucinating or having delusions," the company's founder and CEO, Eugenia Kuyda, told Reuters.



The Fight Over Which Uses of AI Europe Should Outlaw

WIRED

The system, called iBorderCtrl, analyzed facial movements to attempt to spot signs a person was lying to a border agent. The trial was propelled by nearly $5 million in European Union research funding, and almost 20 years of research at Manchester Metropolitan University, in the UK. Polygraphs and other technologies built to detect lies from physical attributes have been widely declared unreliable by psychologists. Soon, errors were reported from iBorderCtrl, too. Media reports indicated that its lie-prediction algorithm didn't work, and the project's own website acknowledged that the technology "may imply risks for fundamental human rights."


Scientists discover lie detector that uses artificial intelligence to detect micro-expressions

#artificialintelligence

Scientists have discovered a new lie detector that can read facial muscles that people won't even know they are using. The study, conducted by researchers at Tel Aviv University, was published in 'Brain and Behaviour.' It was conducted on the basis of micro-expressions that vanish in 40 to 60 milliseconds, which is why accuracy and speed played a key role. "Since this was an initial study, the lie itself was very simple," he added.